Authorship Attribution of Texts: A Review

نویسنده

  • Mikhail B. Malyutov
چکیده

We study the authorship attribution of documents given some prior stylistic characteristics of the author’s writing extracted from a corpus of known works, e.g., authentication of disputed documents or literary works. Although the pioneering paper based on word length histograms appeared at the very end of the nineteenth century, the resolution power of this and other stylometry approaches is yet to be studied both theoretically and on case studies such that additional information can assist finding the correct attribution. We survey several theoretical approaches including ones approximating the apparently nearly optimal one based on Kolmogorov conditional complexity and some case studies: attributing Shakespeare canon and newly discovered works as well as allegedly M. Twain’s newly-discovered works, Federalist papers binary (Madison vs. Hamilton) discrimination using Naive Bayes and other classifiers, and steganography presence testing. The latter topic is complemented by a sketch of an anagrams ambiguity study based on the Shannon cryptography theory. 1. Micro-style analysis

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Lost in Translation: Authorship Attribution using Frame Semantics

We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attri...

متن کامل

Authorship Attribution with Topic Models

Authorship attribution deals with identifying the authors of anonymous texts. Traditionally, research in this field has focused on formal texts, such as essays and novels, but recently more attention has been given to texts generated by on-line users, such as e-mails and blogs. Authorship attribution of such on-line texts is a more challenging task than traditional authorship attribution, becau...

متن کامل

More than Word Frequencies: Authorship Attribution via Natural Frequency Zoned Word Distribution Analysis

With such increasing popularity and availability of digital text data, authorships of digital texts can not be taken for granted due to the ease of copying and parsing. This paper presents a new text style analysis called natural frequency zoned word distribution analysis (NFZ-WDA), and then a basic authorship attribution scheme and an open authorship attribution scheme for digital texts based ...

متن کامل

Explaining Delta, or: How do distance measures for authorship attribution work?

Authorship Attribution is a research area in quantitative text analysis concerned with attributing texts of unknown or disputed authorship to their actual author based on quantitatively measured linguistic evidence (see Juola 2006; Stamatatos 2009; Koppel et al. 2009). Authorship attribution has applications in literary studies, history, forensics and many other fields, e.g. corpus stylistics (...

متن کامل

Syntactic Stylometry: Using Sentence Structure for Authorship Attribution

Most approaches to statistical stylometry have concentrated on lexical features, such as relative word frequencies or type-token ratios. Syntactic features have been largely ignored. This work attempts to fill that void by introducing a technique for authorship attribution based on dependency grammar. Syntactic features are extracted from texts using a common dependency parser, and those featur...

متن کامل

Authorship Attribution of Micro-Messages

Work on authorship attribution has traditionally focused on long texts. In this work, we tackle the question of whether the author of a very short text can be successfully identified. We use Twitter as an experimental testbed. We introduce the concept of an author’s unique “signature”, and show that such signatures are typical of many authors when writing very short texts. We also present a new...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005